ABSTRACT

This paper describes circuits for computation of a large class of algebraic
functions on polynomials, power series, and integers, for which it has been a
long standing open problem to compute in depth less than $\Omega(\log n)^2$.

Algebraic circuits assume unit cost for elemental addition and multiplication.
This paper describes $O(\log n)$ depth algebraic circuits which, given as input the
coefficients of $n$ degree polynomials (over an appropriate ring), compute the
product of $n^{O(1)}$ polynomials, the symmetric functions, as well as division and
interpolation of real polynomials. Also described are $O(\log n)$ depth algebraic
circuits which, given as input the first $n$ coefficients of a power series
(over an appropriate ring), compute the product of $n^{O(1)}$ power series, as well
as division, reciprocal and reversion of real power series.

Furthermore this paper describes boolean circuits of depth $O(\log n(\log\log n))$
which, given $n$-bit binary numbers, compute the product of $n$ numbers and integer
division. As corollaries, we get boolean circuits of the same depth for
evaluating, within accuracy $2^{-n}$, polynomials, power series, and elementary
functions such as (fixed) powers, roots, exponentiations, logarithm, sin and
cosine.

All these circuits have constant indegree, polynomial size, and may be
uniformly constructed by a deterministic Turing machine with space $O(\log n)$.
1.  INTRODUCTION

Much  research  is  now  done  on  parallel  algorithms,  although  in  fact  at  this 
time  most  current  computers  contain  only  a  single  processor.  However,  these 
computers  do  use  parallel  circuits  to  implement  the  most  basic  and  often  repeated 
operations, such as the arithmetic operations: addition, subtraction, multiplication
and division. These operations are generally applied to integers with an
n bit binary representation, and to floating point reals with relative accuracy
$2^{-n}$. Other frequently used repeated operations, which certainly would merit
special purpose circuits, are the elementary functions such as sin, cosine,
arctangent, exponentiation, logarithm, square roots, and fixed powers. For
practical  reasons  we  require  circuits  of  constant  indegree  which  can  be  uniformly 
constructed  within  O(log  n)  deterministic  space  (and  thus  deterministic  polynomial 
time) . 

The  depth  of  a  circuit  is  the  time  for  its  parallel  execution.  What  is  the 
minimum depth of boolean circuits for these arithmetic operations and elementary
functions?

For integer addition, [Ofman, 62], [Krapchenko, 67] and [Ladner and Fischer, 80]
give boolean circuits of depth $O(\log n)$ and size $O(n)$. Subtraction circuits
with the same asymptotic depth and size can easily be obtained from these
addition circuits. Also [Reif, 83] has recently given linear size, constant
indegree boolean circuits of depth $O(\log\log n)$ for addition and subtraction of
random numbers with error probability at most $n^{-O(1)}$.

For integer multiplication, [Ofman, 62] and [Wallace, 64] give boolean circuits
of depth $O(\log n)$, and [Schonhage and Strassen, 71] also achieve depth
$O(\log n)$ with simultaneous size $O(n(\log n)\log\log n)$.


The problem of computing division or the elementary functions in better than
depth $\Omega(\log n)^2$ has been open for at least 17 years since S. Cook's Ph.D. thesis
[Cook, 66] (also see [Borodin and Munro, 75] and [Savage, 76]). [Wallace, 64]
first gave a division circuit with depth $\Omega(\log n)^2$. Subsequently, [Anderson et al.,
67] gave a division circuit of the same depth which was implemented by them on the
IBM/360 Model 91 Floating-Point Execution Unit. [Knuth, 69] and [Aho, Hopcroft
and Ullman, 74] described a division circuit attributed to Steve Cook of depth
$\Omega(\log n)^2$ and size $O(n \log n \log\log n)$. The best known boolean circuit depth for
the elementary functions was $\Omega(\log n)^2$ [Brent, 76], [Kung, 76]. Many of the
above mentioned boolean circuits of depth $\Omega(\log n)^2$ for division and elementary
functions use a second order Newton iteration with $\Omega(\log n)$ steps, each requiring
an n-bit integer multiplication with $\Omega(\log n)$ depth. Alternatively, a reduction
is often made to the problem of computing the m-th power of an n-bit integer modulo
$2^n + 1$ for $m = O(n)$. This can be computed by $\Omega(\log n)$ steps of repeated squaring,
where each square computation requires $\Omega(\log n)$ depth.

By new techniques we achieve depth less than $(\log n)^2$. An essential
technique in the construction of our circuits is the use of convolutions, which
can be computed in boolean depth $O(\log n)$ by the fast Fourier transform.

This  technique  was  first  introduced  by  [Schonhage  and  Strassen,  71]  for  the 
multiplication  of  two  integers.  Our  innovation  was  to  generalize  the 
convolution  technique  to  products  of  more  than  two  terms. 

Section  2  introduces  the  appropriate  mathematical  groundwork  for  the 
generalized  polynomial  convolution  techniques  which  we  utilize.  Also  in 
Section  2  we  give  O(log  n)  depth  algebraic  circuits  for  various  polynomial 
and  power  series  operations.  These  algebraic  circuits  are  interesting  in  the 
theoretical context of parallel algebraic computation, where arithmetic
operations are assumed to be of unit cost.
The last part of this paper is concerned with the possibly more practical
construction of boolean circuits, which originally motivated this work. In
Section 3 we give uniform boolean circuits of nearly logarithmic depth for the
problem of computing the product of $n^{O(1)}$ integers modulo $2^n + 1$. In an
earlier version of this paper [Reif, 83] we proved our boolean circuits had
depth $O(\log n(\log\log n)^2)$. This draft includes an improvement to our construction
due to Beame, Cook, and Hoover which reduces the depth by a factor of $\log\log n$
to $O(\log n(\log\log n))$ and gives simultaneously polynomial size. These results
imply uniform boolean circuits of depth $O(\log n(\log\log n))$ for the problems of
division and computing elementary functions, among others.

This  also  implies  sequential  space  complexity  upper  bounds  for  these  and 
related  problems.  In  particular,  [Borodin,  77]  proved  that  if  a  function  f  is 
computed in uniform boolean circuit depth $D(n) > \log n$, then f can be computed
by a deterministic Turing Machine with space $D(n)$. Thus for example, division and
the elementary functions can be computed by deterministic space $O(\log n(\log\log n))$.
2.  CIRCUITS  FOR  POLYNOMIAL  AND  POWER  SERIES  COMPUTATIONS


Our basic techniques are best understood first in the simpler context of
polynomials and power series. In fact, this context is interesting in itself. We
might envision a special purpose computer designed for algebraic computation.
Its data are (coefficients of) polynomials and power series. The arithmetic
operations including division of polynomials and power series are elementary
operations of our "algebraic computer". Also, frequently applied operations are
the composition of power series, reversion of a power series, computation of
elementary functions applied to power series, and interpolation of polynomials.

We  give  in  this  Section  circuits  of  depth  O(log  n)  for  all  these  polynomial 
and  power  series  operations,  where  each  gate  of  the  circuits  computes  an  addition, 
multiplication,  or  a  division  of  two  elements  of  the  domain. 

2.0  Circuit Definitions

A circuit $\alpha_N$ over a commutative ring $\mathcal{R} = (\mathcal{D}, +, \cdot, 0, 1)$ is an acyclic
labeled digraph, with

(i)  a list of N distinguished input nodes that have no entering edges

(ii)  constant nodes with indegree 0 and labeled with constants in $\mathcal{D}$

(iii)  internal nodes with indegree two and labeled with the symbols in {"+", "·"}

(iv)  a list of k distinguished output nodes.

Given an assignment of the input nodes from domain $\mathcal{D}$, the value of the
circuit at the output nodes is obtained by evaluation of the gates in topological
order. The circuit thus defines a mapping from $\mathcal{D}^N$ to $\mathcal{D}^k$. A circuit
$\alpha_N$ over a field is similarly defined, except the internal nodes can also compute
division. Since division may yield an undefined value, a circuit over a field
defines in general a partial mapping of inputs to outputs.
Let $\mathcal{R}[x]$ be the polynomials over commutative ring $\mathcal{R}$. Let $\mathcal{R}[[x]]$
be the power series over $\mathcal{R}$.

Let f be a partial function of (the coefficients of) m polynomials
$p_1(x),\ldots,p_m(x)$ in $\mathcal{R}[x]$ of degree $n-1$. A circuit $\alpha_N$ for f has $N = mn$
inputs, namely the list of N coefficients in $\mathcal{D}$ of the given polynomials.
The output nodes of $\alpha_N$ give the list of coefficients of $f(p_1(x),\ldots,p_m(x))$.
If on the other hand f is a function of m power series $p_1(x),\ldots,p_m(x)$ in
$\mathcal{R}[[x]]$ each with n given low order coefficients, then the circuit $\alpha_N$ for
f also has $N = nm$ inputs, and the output nodes of $\alpha_N$ only give some prescribed
finite number of the coefficients of (the possibly infinite) power series
$f(p_1(x),\ldots,p_m(x))$.

The depth of circuit $\alpha_N$ is the length of its longest path. A partial function
f over polynomials or power series in $\mathcal{R}$ has simultaneous depth $O(D(N))$ and
size $O(S(N))$ if there exists an infinite family of circuits $\alpha_1,\ldots,\alpha_N,\ldots$ and
constants $c_1, c_2$ such that for all $N \ge 1$, $\alpha_N$ has depth not more than $c_1 D(N)$ and
size not more than $c_2 S(N)$ and, given N input coefficients of the input polynomials
or power series, computes f within the prescribed number of coefficients.

Let $\alpha_1, \alpha_2, \ldots$ be a family of circuits over $\mathcal{R} = (\mathcal{D}, +, \cdot, 0, 1)$ where $\mathcal{D}$ is
countable. Fix some enumeration $c_1, c_2, \ldots$ of the constants in $\mathcal{D}$. We assume
each circuit $\alpha_N$ is encoded by a binary string where the binary representation of
i is used to represent each constant symbol $c_i$ labeling a node in $\alpha_N$. (Thus,
for example, the N-th root of unity, if it exists, might be represented by a binary
string of length $\log N$.) The circuit family $\alpha_1, \alpha_2, \ldots$ is uniform in the
sense of [Borodin, 77] if there exists a logarithmic space deterministic Turing
Machine which, given any $N > 0$ in unary, outputs the binary encoding of
$\alpha_N$. All the circuits considered in this paper are uniform in this sense.
2.1  The Discrete Fourier Transform

Fix a commutative ring $\mathcal{R} = (\mathcal{D}, +, \cdot, 0, 1)$. We assume $\omega$ is a principal N-th
root of unity in $\mathcal{R}$ and that N has a multiplicative inverse. (For example,
$e^{2\pi\sqrt{-1}/N}$ is a principal N-th root of unity in the complex numbers.)

Given a vector $a \in \mathcal{D}^N$, the Discrete Fourier Transform is

$$DFT_N(a) = Aa$$

where $A_{ij} = \omega^{ij}$ for $0 \le i, j < N$. Then $A^{-1}$ exists [Aho, Hopcroft, Ullman,
74, p. 253], where $A^{-1}_{ij} = \frac{1}{N}\omega^{-ij}$. The inverse Discrete Fourier Transform is
$DFT_N^{-1}(a) = A^{-1}a$ and obviously satisfies $DFT_N^{-1}(DFT_N(a)) = a$. (Note: given a
vector $a \in \mathcal{D}^n$, where $n < N$, $DFT_N(a)$ will be defined to be $DFT_N(a')$ where
$a'$ is the vector of length N derived by concatenating a with $N-n$ zeros.)

[Cooley and Tukey, 65] gave the Fast Fourier Transform for which

THEOREM 2.1.  $DFT_N$ and $DFT_N^{-1}$ over $\mathcal{R}$ have simultaneous depth $O(\log N)$
and size $O(N \log N)$.

Note: the assumption of the N-th root of unity is not really essential to our
techniques, since in general our techniques will be applicable whenever an $O(\log n)$
depth circuit exists for the Discrete Fourier Transform. For example, Theorem 2.1
obviously applies to the complex numbers, and since the field operations over
complex numbers can be simulated over the reals with only a factor of two depth increase,
Theorem 2.1 also applies to the reals.
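The definitions of Section 2.1 can be tried out numerically. The following is a minimal sketch over the complex numbers, assuming $\omega = e^{2\pi\sqrt{-1}/N}$; it evaluates the matrix definition $DFT_N(a) = Aa$ directly (quadratic work, not the Cooley-Tukey circuit), and the function names `dft` and `inverse_dft` are illustrative only.

```python
import cmath

def dft(a, N):
    """DFT_N(a) = A a with A[i][j] = omega**(i*j); a is padded with N - len(a) zeros."""
    omega = cmath.exp(2j * cmath.pi / N)  # principal N-th root of unity in C
    padded = list(a) + [0] * (N - len(a))
    return [sum(padded[j] * omega ** (i * j) for j in range(N)) for i in range(N)]

def inverse_dft(y):
    """Apply A^{-1}, whose (i, j) entry is omega**(-i*j) / N."""
    N = len(y)
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(y[j] * omega ** (-i * j) for j in range(N)) / N for i in range(N)]
```

Applying `inverse_dft(dft(a, N))` should return the zero-padded input up to floating point rounding, which is the identity $DFT_N^{-1}(DFT_N(a)) = a$.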


2.2  Products of Polynomials

Suppose we are given m vectors $a_i \in \mathcal{D}^n$ for $i = 1,\ldots,m$. Each vector
$a_i = (a_{i,0},\ldots,a_{i,n-1})^T$ gives the coefficients of an $n-1$ degree polynomial
$A_i(x) = \sum_{j=0}^{n-1} a_{i,j} x^j$ in $\mathcal{R}[x]$. Let $N = nm$. We wish to compute the product
polynomial $B(x) = \sum_{k=0}^{N-1} b_k x^k$, where $B(x) = \prod_{i=1}^{m} A_i(x)$. (Note that we have $b_k = 0$
where the last term in each expression is the cost of the $\log N$
constant size problems. These last terms are bounded by $O(\log N)$ and
$N^{3+d}$, respectively. Summing the $\log\log N$ terms in each expression of
(vi) we get

(vii)  $D(N^{1/8}, N) \le (c+1) \log N \log\log N$

       $S(N^{1/8}, N) \le N^{3+d} \log\log N$

Substituting (vii) into (iii), we get

(viii)  $D(m, N) \le O(\log(m) \log\log N + \log mN)$

        $S(m, N) = (mN)^{O(1)}$

and thus the theorem is proven.  □
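The depth bound in (vii) can be checked by a short calculation. The sketch below assumes that the i-th term of the depth sum in (vi) is $2^i c \log N^{2^{-i}}$, so that every term equals $c \log N$, and that the sum has at most $\log\log N + 1$ terms.

```latex
\sum_{i=0}^{\log\log N} 2^{i}\, c \log N^{2^{-i}}
  = \sum_{i=0}^{\log\log N} 2^{i} \cdot 2^{-i}\, c \log N
  = c\,(\log\log N + 1)\log N
  \le (c+1)\,\log N \,\log\log N
  \qquad \text{whenever } \log\log N \ge c .
```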

3.3  Multiprecision Evaluation of Polynomials and Power Series

Let p(x) be a real polynomial or a real power series with
$n-1$ given rational coefficients of magnitude $< 2^n$. We wish to evaluate p(x)
at a floating point real $x_0$ within accuracy $O(2^{-n})$. Theorem 3.3 implies

COROLLARY 3.1.  The evaluation of p(x) at a given $x_0$ to accuracy $O(2^{-n})$
has depth $O(\log n(\log\log n))$ and size $n^{O(1)}$.

The elementary functions exp(x), log(x), sin(x), cos(x), arctan(x),
square root(x), etc. have Taylor series expansions convergent within accuracy
$O(2^{-n})$ over fixed intervals.
arbitrary m clearly follows from this solution via a single application
of reduction (iii). The method of attack is to use reductions (i) and (ii)
alternately to reduce the problem to a smaller one of the same type.

Reduction of N:  Apply the DFT reduction (i) twice and then reduction (ii)

(iv)  $D(N^{1/8}, N) \le D(N^{1/8}, t(N^{1/2})) + O(\log N)$

                     $\le 2D((N^{1/2})^{1/8}, t(N^{1/2})) + O(\log N)$

and

      $S(N^{1/8}, N) \le N^{3/2} S((N^{1/2})^{1/8}, t(N^{1/2})) + N^{O(1)}$

So for sufficiently large N and some fixed c, d

(v)  $D(N^{1/8}, N) \le 2D((N^{1/2})^{1/8}, t(N^{1/2})) + c \log N$

     $S(N^{1/8}, N) \le N^{3/2} S((N^{1/2})^{1/8}, t(N^{1/2})) + N^d$

The original problem of size N has been reduced to problems of size $N^{1/2}$.
These reductions must be applied $\le \log\log N$ times until the problems are of
constant size. Analysing (v) carefully by expanding out terms we get

(vi)  $D(N^{1/8}, N) \le c \log N + 2c \log N^{1/2} + 2^2 c \log N^{1/4} + \cdots + 2^{\log\log N} c \log N^{2^{-\log\log N}}$

      $S(N^{1/8}, N) \le N^d + N^{3/2+d} + N^{3/2+3/2(1/2)+d} + \cdots + N^{3/2+3/2(1/2)+\cdots+3/2(2^{-\log\log N})+d}$

Thus
Proof.  Given a list of N-bit integers $a_1,\ldots,a_m$ we compute
the product $\prod_{i=1}^{m} a_i \bmod (2^N + 1)$. Let the boolean depth and size required
to compute this product be $D(m,N)$ and $S(m,N)$ respectively. Let $t(x)$
be the largest power of two less than x. Using this notation, Lemma 3.1
leads to the following recurrences

(i)  $D(m,N) \le D(m, t((mN)^{3/5})) + O(\log mN)$

     $S(m,N) \le (mN)^{3/5} S(m, t((mN)^{3/5})) + (mN)^{O(1)}$

(Note: slightly tighter recurrences can be obtained from Lemma 3.1,
but this does not significantly affect the asymptotic analysis.)

Reduction of m:  When $m > N^{1/8}$, group the m input integers into blocks
of size at most $\lceil N^{1/8} \rceil$ and compute the products for each block. Then compute the
product of all the $\lceil m/N^{1/8} \rceil$ blocks. To avoid worrying about the ceiling
function in describing the number of integers in each of these products,
first perform a single multiplication of two integers mod $2^N + 1$ to
reduce this number by one. Thus,

(ii)  $D(m,N) \le D(N^{1/8}, N) + D(m/N^{1/8}, N) + O(\log mN)$

      $S(m,N) \le (m/N^{1/8})\, S(N^{1/8}, N) + S(m/N^{1/8}, N) + (mN)^{O(1)}$

Continuing this process recursively results in an $N^{1/8}$-ary tree of multiplication
nodes, so the desired product may be computed using sub-circuits
which compute products of only $N^{1/8}$ integers. This tree has depth of at most
$\lceil 8 \log m / \log N \rceil$ and certainly has fewer than m nodes. It follows that

(iii)  $D(m,N) \le \lceil 8 \log m / \log N \rceil \, D(N^{1/8}, N) + O(\log mN)$,

       $S(m,N) \le m\, S(N^{1/8}, N) + (mN)^{O(1)}$

It is now possible to consider the problem for $m \le N^{1/8}$. The solution for
The reduction is correct if n is a power of two and furthermore the
coefficients of B(x) are small, that is if $|b_j| < p/2$. Applying the below
Proposition 3.2 we can ensure $|b_j| < 2^{n-1}$ by having $N \ge 16$, $m \le N^{1/2}$
and choosing n to be the largest power of 2 less than $16(mN)^{1/2}$.  □

PROPOSITION 3.2.  For each $j = 0,\ldots,n-1$, the magnitude of the coefficients of
B(x) is given by $|b_j| < 2^{2m(\ell+1+\log n)}$.

Proof.  Let $f(i)$ be the maximum magnitude of any coefficient of a polynomial
resulting from a product of $2^i$ of the $A_k(x)$ polynomials taken mod $(x^n+1)$.
Clearly $f(0) < 2^\ell$ and $f(i) < 2n f(i-1)^2$ for $i > 0$. The general solution of
the recurrence $S_{i+1} = c S_i^2$ is $S_i = c^{2^i-1} S_0^{2^i}$. Setting $S_0 = 2^\ell$ and $c = 2n$
we have $f(i) \le (2n)^{2^i-1} 2^{2^i \ell} \le 2^{\ell 2^i + 2^i - 1 + (2^i-1)\log n}$. Hence,

$$f(\lceil \log m \rceil) \le 2^{2m(\ell+1) - 1 + (2m-1)\log n} < 2^{2m(\ell+1+\log n)}.  □$$

The key idea of Theorem 3.3 is that when $m > N^{1/8}$, the $a_i$ are grouped
into blocks of size $< m$ and the product circuit is applied to these smaller
blocks, thus reducing m relative to N. When $m \le N^{1/8}$ our DFT reduction of
Lemma 3.1 is applied to decrease N relative to m. In our original construction
[Reif, 83] we required $O(\log\log N)$ applications of Lemma 3.1 to accomplish
this decrease of N. [Beame, Cook, and Hoover, 84] suggested an improvement
which requires only a constant number of applications of our DFT reduction
to appropriately reduce N. We give this improved version below, with their
kind permission.

THEOREM 3.3.  For N a power of two, the product of m N-bit integers
mod $2^N + 1$ has boolean depth $O(\log(m)\log\log N + \log(N))$ and size $(mN)^{O(1)}$.
(2)  We intend to take $DFT_n$ with $\omega = 4$, $\psi = 2$ and $p = \omega^{n/2} + 1 = 2^n + 1$. Associate
with each $a_i$ a coefficient vector $\hat{a}_i$ defined by

$$\hat{a}_i = (a_{i,0}, \psi a_{i,1}, \ldots, \psi^{n-1} a_{i,n-1})^T$$

(3)  Compute in parallel

$$DFT_n(\hat{a}_i) = g_i = (g_{i,0},\ldots,g_{i,n-1})$$

(4)  Compute product

$$e_k = \prod_{i=1}^{m} g_{i,k} \bmod p$$

(5)  Compute

$$DFT_n^{-1}((e_0,\ldots,e_{n-1})^T) = \hat{b} = (b_0, \psi b_1, \ldots, \psi^{n-1} b_{n-1})^T$$

to obtain the coefficients of the product polynomial

$$B(x) = \sum_{j=0}^{n-1} b_j x^j$$

where by Lemma 2.3, $b = B(2^\ell)$.

(6)  Evaluate $B(2^\ell)$ to get b.

Since $\psi$ is a power of two, we can easily extract each $b_j$ from $\psi^j b_j$ by
bit shifting. By Theorem 3.1, the $DFT_n$ and $DFT_n^{-1}$ computations have depth
$O(\log n)$. Thus all of these computations have depth $O(\log N + \log m + \log n) =
O(\log mN)$ except possibly computing the $e_k$ modular product in step (4). Note
that we can use the identity $2^n x \equiv (2^n + 1 - x) \bmod (2^n + 1)$ to simplify the
computation of $e_k$ to the product of at most m numbers, each of n bits.

Thus the depth $D(m,N)$ of the resulting circuit satisfies

$$D(m,N) = D(m,n) + O(\log mN)$$
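Steps (1) to (6) of the reduction can be followed on small numbers. The sketch below is a sequential illustration only, not the boolean circuit of the lemma: it evaluates the transforms by the quadratic matrix definition, it recovers signed coefficients by a centered lift modulo p (valid only when every $|b_j| < p/2$), and the names `transform` and `product_mod_fermat` as well as the choice of N and n in the usage note are assumptions made for the example.

```python
def transform(vec, root, p):
    """Matrix-definition DFT of vec over the integers modulo p with the given root."""
    n = len(vec)
    return [sum(v * pow(root, k * j, p) for j, v in enumerate(vec)) % p
            for k in range(n)]

def product_mod_fermat(nums, N, n):
    """Product of the N-bit integers in nums modulo 2**N + 1, using n chunks each."""
    ell = N // n                          # bits per chunk
    psi, omega, p = 2, 4, 2 ** n + 1      # psi**n = -1 and omega = psi**2 modulo p
    # Steps (1) and (2): split into chunks and weight chunk j by psi**j.
    weighted = []
    for a in nums:
        chunks = [(a >> (ell * j)) & (2 ** ell - 1) for j in range(n)]
        weighted.append([c * pow(psi, j, p) % p for j, c in enumerate(chunks)])
    # Step (3): transform every weighted vector.
    g = [transform(v, omega, p) for v in weighted]
    # Step (4): pointwise products modulo p.
    e = [1] * n
    for g_i in g:
        e = [x * y % p for x, y in zip(e, g_i)]
    # Step (5): inverse transform, then remove the weights psi**j.
    n_inv, omega_inv, psi_inv = pow(n, -1, p), pow(omega, -1, p), pow(psi, -1, p)
    b_hat = transform(e, omega_inv, p)
    b = [x * n_inv * pow(psi_inv, j, p) % p for j, x in enumerate(b_hat)]
    b = [x - p if x > p // 2 else x for x in b]   # centered lift to signed values
    # Step (6): evaluate B(2**ell) modulo 2**N + 1.
    return sum(c * 2 ** (ell * j) for j, c in enumerate(b)) % (2 ** N + 1)
```

For instance, `product_mod_fermat([40000, 12345, 54321], 16, 16)` uses one-bit chunks and $p = 2^{16} + 1$, and can be compared against `(40000 * 12345 * 54321) % (2**16 + 1)`.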


see [Savage, 76]) yielding a boolean circuit of depth $O(\log(n \log p))$ and size
$O(n \log p \log(n \log p))$. Thus we have

THEOREM 3.1.  $DFT_n$ and $DFT_n^{-1}$ over the ring $\mathbb{Z}_p$ have simultaneous boolean
depth $O(\log(n \log p)) = O(\log n)$ and size $O(n \log p \log(n \log p))$.

3.2  Products of Integers

[Schonhage, Strassen, 71] have shown:

THEOREM 3.2.  The product of two N-bit integers has simultaneous boolean depth
$O(\log N)$ and size $O(N \log N \log\log N)$.

We now prove that for N a power of two, the modulo $2^N + 1$ product of m
integers, each of N bits, has boolean depth $O(\log(Nm)\log\log N)$. (Note that the
naive method of repeated squaring by Theorem 3.2 results in a boolean circuit of
depth $O(\log(m)\log N)$.) We begin with a key lemma which reduces the number of
bits of the integers to be multiplied.

LEMMA 3.1.  (DFT Reduction)  For N a power of two, mN sufficiently large,
and any $m \le N^{1/2}$, the product modulo $(2^N + 1)$ of m integers each N bits long
can be computed in $O(\log mN)$ additional boolean depth and $(mN)^{O(1)}$ additional
gates after computing $n = O((mN)^{1/2})$ products modulo $(2^n + 1)$, each of m
integers each n bits long.

Proof.  Let $a_1,\ldots,a_m$ be a list of N-bit numbers. We wish to compute

$$b = \prod_{i=1}^{m} a_i \bmod (2^N + 1).$$

(1)  Since $N = 2^u$ for some integer u, we can block each N-bit $a_i$ into n
(n a power of 2) chunks $a_{i,0},\ldots,a_{i,n-1}$ of $\ell = N/n$ bits each so that

$$a_i = \sum_{j=0}^{n-1} a_{i,j}\, 2^{\ell j}$$

where $0 \le a_{i,j} < 2^\ell$. Define the associated polynomial

$$A_i(x) = \sum_{j=0}^{n-1} a_{i,j}\, x^j$$

and observe $a_i = A_i(2^\ell)$.
3.  INTEGER COMPUTATIONS

3.0  Boolean Circuits

We consider computations over integers given as n bit binary numbers, and
reals over [0,1] given within accuracy $2^{-n}$. Our computational model in this
section is the boolean circuit, defined as usual. The i-th input node of $\alpha_n$
takes the i-th bit of the encoding of the input integer or real. Each gate of
$\alpha_n$ computes a boolean operation $\vee$, $\wedge$, or $\neg$. Each output node provides a bit
of the encoding of the computed integer or real. (In the case of reals with
floating point representation, we only provide the input and output bits up to
some finite prescribed accuracy.)

3.1  The DFT over an Integer Ring

We assume n and $\omega$ are positive powers of two. Let $p = \omega^{n/2} + 1$ and
let $\mathbb{Z}_p$ be the ring of integers modulo p.

PROPOSITION 3.1.  In $\mathbb{Z}_p$, $\omega$ is a principal nth root of unity and n has a
multiplicative inverse modulo p.

Proposition 3.1 implies that $DFT_n$ and $DFT_n^{-1}$ are well defined.
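Proposition 3.1 can be checked numerically for small parameters. The sketch below assumes the usual meaning of "principal n-th root of unity" ($\omega^n = 1$ and $\sum_{j=0}^{n-1} \omega^{jk} = 0$ for $1 \le k < n$); the helper name `check_proposition` is illustrative only.

```python
def check_proposition(n, omega):
    """Return (p, ok) where p = omega**(n/2) + 1 and ok reports the claimed properties."""
    p = omega ** (n // 2) + 1
    is_root = pow(omega, n, p) == 1
    sums_vanish = all(
        sum(pow(omega, j * k, p) for j in range(n)) % p == 0
        for k in range(1, n)
    )
    n_inv = pow(n, -1, p)  # raises ValueError if n had no inverse modulo p
    return p, is_root and sums_vanish and (n * n_inv) % p == 1
```

For example, `check_proposition(8, 4)` works in the integers modulo 257, and `check_proposition(8, 8)` works modulo 4097, which is not prime, so the check does not rely on p being a prime.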

The fast Fourier transform computation of [Cooley and Tukey, 65] yields an
arithmetic circuit $\alpha_n$ of depth $O(\log n)$ and size $O(n \log n)$ computing $DFT_n$
whose elements require:

(i)  addition of two $\lceil \log(p) \rceil$-bit integers.

(ii)  multiplication of a $\lceil \log(p) \rceil$-bit integer by a power of $\omega$.

We wish to expand $\alpha_n$ into a boolean circuit. Since $\omega$ is a power of two,
the multiplications can be implemented by the appropriate bit shifts (i.e., the
gate connections are shifted by the appropriate amount). The additions can be
implemented by Carry-Save Add circuitry of [Ofman, 62] and [Wallace, 64] (also


We now show that Theorem 2.3 and Corollary 2.4 imply:

COROLLARY 2.7.  The reversion of a real power series has depth $O(\log n)$.

Proof.  Let $A(x) = \sum_{i=0}^{\infty} a_i x^i$ be a real power series where $a_0 = 0$ and $a_1 = 1$.
The reversion of A(x) is the power series $R(z) = \sum_{k=0}^{\infty} r_k z^k$ where $z = A(x)$
iff $x = R(z)$. Note that $r_0 = 0$ and $r_1 = 1$. For the kth coefficient, we first
compute

$$B(x) = \left(\frac{x}{A(x)}\right)^k = \sum_{i=0}^{\infty} b_i x^i$$

and then apply Lagrange's reversion formula [Lagrange, 1768] $r_k = b_{k-1}/k$ for
$k \ge 2$. Thus Theorem 2.3 implies Corollary 2.7.  □
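Lagrange's formula $r_k = b_{k-1}/k$ is easy to exercise with exact rational arithmetic. The following is a sequential sketch (it makes no attempt at logarithmic depth); the helper names and the truncation length `terms` are assumptions of the example, and the input is a coefficient list with $a_0 = 0$ and $a_1 = 1$.

```python
from fractions import Fraction

def mul_trunc(a, b, terms):
    """Product of two power series, keeping the first `terms` coefficients."""
    out = [Fraction(0)] * terms
    for i, x in enumerate(a[:terms]):
        for j, y in enumerate(b[:terms - i]):
            out[i + j] += x * y
    return out

def reciprocal(a, terms):
    """First `terms` coefficients of 1/a(x), assuming a[0] != 0."""
    inv = [Fraction(1) / a[0]]
    for k in range(1, terms):
        s = sum(a[j] * inv[k - j] for j in range(1, min(k, len(a) - 1) + 1))
        inv.append(-s / a[0])
    return inv

def reversion(a, terms):
    """Coefficients r_0..r_{terms-1} of R(z) with A(R(z)) = z."""
    u = reciprocal([Fraction(c) for c in a[1:]], terms)   # x / A(x)
    r = [Fraction(0)]
    power = [Fraction(1)] + [Fraction(0)] * (terms - 1)
    for k in range(1, terms):
        power = mul_trunc(power, u, terms)                # (x / A(x))**k
        r.append(power[k - 1] / k)                        # r_k = b_{k-1} / k
    return r
```

For $A(x) = x + x^2$, `reversion([0, 1, 1], 6)` gives the coefficients 0, 1, -1, 2, -5, 14, the signed Catalan numbers expected from $R(z) = (-1 + \sqrt{1+4z})/2$.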
Thus to compute the coefficients of q(x), r(x) we compute the first
$n_1 - n_2 + 1$ coefficients of $A(z)/B(z) = Q(z) + O(z^{n_1-n_2+1})$, then compute the power
series $A(z) - B(z)Q(z) = z^{n_1-n_2+1} R(z)$, and finally output the coefficients of
Q(z), R(z).  □
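The division procedure can be sketched sequentially as follows. The sketch assumes, following the standard treatment in [Knuth, 81], that A(z) and B(z) are the coefficient reversals of a(x) and b(x); the function name `poly_divmod` and the low-order-first coefficient lists are choices of the example, and the leading coefficients are assumed nonzero.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Return (q, r) with a = q*b + r and deg r < deg b; lists are low order first."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    n1, n2 = len(a) - 1, len(b) - 1
    if n1 < n2:
        return [Fraction(0)], a
    terms = n1 - n2 + 1
    A, B = a[::-1], b[::-1]              # reversed polynomials, B[0] != 0
    Q = []
    for k in range(terms):               # first n1 - n2 + 1 coefficients of A(z)/B(z)
        s = A[k] - sum(B[j] * Q[k - j] for j in range(1, min(k, n2) + 1))
        Q.append(s / B[0])
    q = Q[::-1]                          # reverse back to obtain q(x)
    r = a[:]
    for i, qc in enumerate(q):           # r(x) = a(x) - q(x) b(x)
        for j, bc in enumerate(b):
            r[i + j] -= qc * bc
    return q, (r[:n2] or [Fraction(0)])
```

For example, dividing $x^3 + 2x + 1$ by $x + 1$ via `poly_divmod([1, 2, 0, 1], [1, 1])` yields the quotient $x^2 - x + 3$ and the remainder $-2$.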

COROLLARY 2.6.  Interpolation of real polynomials has depth $O(\log n)$.

Proof.  Suppose we are given real polynomials $p_1(x),\ldots,p_m(x)$ each of
degree $n-1$, and real polynomials $q_1(x),\ldots,q_m(x)$ where degree$(q_i(x)) <$ degree$(p_i(x))$
for $i = 1,\ldots,m$. Let $P(x) = \prod_{i=1}^{m} p_i(x)$. The Chinese Remainder Theorem states
that there is a unique polynomial Q(x) of degree less than that of P(x) such
that $Q(x) \equiv q_i(x) \bmod p_i(x)$ for $i = 1,\ldots,m$.

The Lagrangian interpolation formula gives

$$Q(x) \equiv \sum_{i=1}^{m} q_i(x)\, r_i(x)\, s_i(x) \bmod P(x)$$

where $s_i(x) = P(x)/p_i(x)$ and $r_i(x)$ is the multiplicative inverse of
$s_i(x) \bmod p_i(x)$.

Theorem 2.2 and Corollary 2.5 imply that preconditioned Chinese remaindering,
with the $r_1(x),\ldots,r_m(x)$ also given, has depth $O(\log n)$.

However, in the special case $p_i(x) = x - a_i$ for $i = 1,\ldots,m$, where the $a_i$
are distinct, each $r_i(x) = 1/s_i(a_i)$ can be computed in parallel by
Theorem 3.3 and Corollary 2.5 in depth $O(\log n)$. In this case the $q_i(x) = b_i$
are constants, since they must have degree less than the $p_i(x)$.

Further note that in this case Q(x) is the unique polynomial such that
$Q(a_i) = b_i$ for $i = 1,\ldots,m$. Thus we have proved Corollary 2.6.  □
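The special case $p_i(x) = x - a_i$ of the proof can be written out directly. The sketch below is sequential and uses exact rationals; it takes $r_i$ to be the constant $1/s_i(a_i)$, which is the inverse of $s_i(x)$ modulo $x - a_i$, and the name `interpolate` and the `(a_i, b_i)` pair format are assumptions of the example.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists, low order first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def interpolate(points):
    """Coefficients of the unique Q with Q(a_i) = b_i, for distinct a_i."""
    m = len(points)
    Q = [Fraction(0)] * m
    for i, (a_i, b_i) in enumerate(points):
        s = [Fraction(1)]
        for k, (a_k, _) in enumerate(points):
            if k != i:
                s = poly_mul(s, [Fraction(-a_k), Fraction(1)])   # s_i(x) = P(x)/(x - a_i)
        s_at_a = sum(c * Fraction(a_i) ** e for e, c in enumerate(s))
        r = 1 / s_at_a                                           # r_i = 1 / s_i(a_i)
        for e, c in enumerate(s):
            Q[e] += b_i * r * c                                  # add b_i * r_i * s_i(x)
    return Q
```

For the points (0, 1), (1, 3), (2, 7) this returns the coefficients of $x^2 + x + 1$.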




An alternative method using the lemma below results in a circuit of depth
$O(\log n)$ with smaller circuit size.

LEMMA 2.4.  If $\tilde{I}(z) = \prod_{i=0}^{\log(n+1)-1} \left(1 + (1 - A(z))^{2^i}\right)$ then $|\tilde{I}(z) - I(z)| = O(z^{n+1})$.

Proof.  Let $B(z) = 1 - A(z)$. Then $A(z)\tilde{I}(z) = (1 - B(z))\tilde{I}(z) = 1 - B(z)^{n+1} = 1 - (1 - A(z))^{n+1}$.
So

$$|\tilde{I}(z) - I(z)| = \left|\frac{(1 - A(z))^{n+1}}{A(z)}\right| = O(z^{n+1}).  □$$
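The product formula of Lemma 2.4 can be tried on integer series with $a_0 = 1$. The sketch below is sequential; `terms` plays the role of $n+1$ and is assumed to be a power of two, and the helper names are illustrative only.

```python
def mul_trunc(a, b, terms):
    """Product of two power series, keeping the first `terms` coefficients."""
    out = [0] * terms
    for i, x in enumerate(a[:terms]):
        for j, y in enumerate(b[:terms - i]):
            out[i + j] += x * y
    return out

def reciprocal_by_product(a, terms):
    """First `terms` coefficients of 1/A(z) via prod (1 + (1 - A)^(2^i)); needs a[0] == 1."""
    a = (list(a) + [0] * terms)[:terms]
    power = [-c for c in a]
    power[0] += 1                         # power = 1 - A(z), the i = 0 case
    result = [1] + [0] * (terms - 1)
    k = 1
    while k < terms:                      # log2(terms) factors in total
        factor = power[:]
        factor[0] += 1                    # 1 + (1 - A(z))^(2^i)
        result = mul_trunc(result, factor, terms)
        power = mul_trunc(power, power, terms)
        k *= 2
    return result
```

For $A(z) = 1 + z$ and `terms = 8` the result is the alternating sequence 1, -1, 1, -1, ..., and multiplying it back by A(z) gives 1 up to terms of order $z^8$.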


COROLLARY 2.5.  Given real polynomials a(x), b(x) of degree at most
n, we can compute in depth $O(\log n)$ the unique polynomials q(x), r(x) such
that $a(x) = q(x)b(x) + r(x)$ and degree$(r(x)) <$ degree$(b(x))$.

Proof.  (Also, see [Knuth, 81]).  Let $n_1$ = degree$(a(x))$ and $n_2$ = degree$(b(x))$.
The computation is trivial unless $n_1 \ge n_2 > 1$. Then

$$A(z) = Q(z)B(z) + z^{n_1-n_2+1} R(z)$$

where
given intervals. Thus by Corollary 2.1 we have

COROLLARY 2.2.  The elementary functions on $\mathcal{R}[[x]]$ have depth $O(\log n)$.

For some given $x_1,\ldots,x_N \in \mathcal{D}$ it is frequently useful in algebraic
computations to determine the polynomial

$$\prod_{i=1}^{N} (y - x_i) = \sum_{j=0}^{N} (-1)^j p_j\, y^{N-j}$$

whose coefficients $p_j = \sum_{i_1 < i_2 < \cdots < i_j} x_{i_1} \cdots x_{i_j}$ are the elementary symmetric functions. It was
pointed out to us by Les Valiant that Theorem 2.3 immediately implies

COROLLARY 2.3.  The elementary symmetric functions in $\mathcal{R}[[x]]$ have depth
$O(\log N)$.
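The connection between the product $\prod_i (y - x_i)$ and the elementary symmetric functions can be seen in a few lines. This is a sequential sketch that multiplies the linear factors one at a time; the name `elementary_symmetric` is illustrative only.

```python
def elementary_symmetric(xs):
    """Return [p_0, ..., p_N] read off from the coefficients of prod (y - x_i)."""
    poly = [1]                            # running product, low order first
    for x in xs:
        nxt = [0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] -= x * c               # contribution of -x * y**k
            nxt[k + 1] += c               # contribution of y * y**k
        poly = nxt
    N = len(xs)
    return [(-1) ** j * poly[N - j] for j in range(N + 1)]
```

For `xs = [1, 2, 3]` the product is $y^3 - 6y^2 + 11y - 6$, so the function returns `[1, 6, 11, 6]`.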

2.5  Division, Interpolation and Reversion

Let $A(z) = \sum_{i=0}^{n-1} a_i z^i$ be a real power series where $a_0 = 1$.
The reciprocal of A(z) is the power series $I(z) = \sum_{i=0}^{\infty} r_i z^i$ such that
$A(z) \cdot I(z) = 1$. I(z) has the infinite series expansion

$$I(z) = \sum_{i=0}^{\infty} (1 - A(z))^i$$

We wish to compute the first n coefficients of I(z). Since
$I(z) = \sum_{i=0}^{n-1} (1 - A(z))^i + O(z^n)$, we have by Theorem 2.3:

COROLLARY 2.4.  The first n terms of the reciprocal of a real power series and
the division of two real power series can be computed in depth $O(\log n)$.
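Now let $\mathrm{DFT}_n(\tilde{d}) = (e_0, \ldots, e_{n-1})^T$. Then for $k = 0, \ldots, n-1$ we get

\[ e_k = \sum_{\ell=0}^{n-1} d_\ell \psi^{\ell} \omega^{\ell k} = \sum_{\ell=0}^{n-1} \sum_{r=0}^{m-1} \psi^{\ell} \omega^{\ell k} (-1)^r b_{nr+\ell} \quad \text{by Lemma 2.2} \]

\[ = \sum_{\ell=0}^{n-1} \sum_{r=0}^{m-1} \psi^{\ell} \omega^{\ell k} (-1)^r \sum_{\substack{0 \le j_1, \ldots, j_m < n \\ nr + \ell = \sum_{i=1}^{m} j_i}} \prod_{i=1}^{m} a_{i,j_i}. \]

But if we substitute $\ell = (\sum_{i=1}^{m} j_i) - nr$ into the above expansion, we get

\[ \psi^{\ell} \omega^{\ell k} (-1)^r = \psi^{\sum_i j_i} \omega^{k \sum_i j_i} \]

since $\psi^{nr} = (-1)^r$ and $\omega^n = 1$. Hence $e'_k = e_k$.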

The above Lemmas 2.2, 2.3 and Theorem 2.1 imply:

THEOREM 2.4. The modular product $(A_1(x) \cdots A_m(x)) \bmod (x^n + 1)$ of polynomials $A_1(x), \ldots, A_m(x)$ in $\mathcal{R}[x]$ of degree $n-1$ has simultaneous depth $O(\log(nm))$ and size $O(nm \log(nm))$. The modular power $A(x)^m \bmod (x^n + 1)$ of a single polynomial $A(x)$ of degree $n-1$ has simultaneous depth $O(\log(nm))$ and size $O(n \log(nm))$.
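The construction behind this theorem can be sketched in Python under some loudly stated assumptions: the ring is taken to be the complex numbers, with $\omega = e^{2\pi i/n}$ and $\psi = e^{\pi i/n}$ (so $\psi^2 = \omega$ and $\psi^n = -1$), the inputs have integer coefficients so the result can be rounded, and the DFT is the naive quadratic one rather than a logarithmic depth circuit. The function names are illustrative only. The second function multiplies the polynomials out and folds the coefficients as in Lemma 2.2, which gives a reference to compare against.

```python
import cmath


def dft(v, omega):
    """Naive DFT: out[k] = sum_j v[j] * omega^(j*k)."""
    n = len(v)
    return [sum(v[j] * omega ** (j * k) for j in range(n)) for k in range(n)]


def modular_product(polys, n):
    """Product of integer polynomials (length-n coefficient lists) mod x^n + 1."""
    omega = cmath.exp(2j * cmath.pi / n)  # principal n-th root of unity
    psi = cmath.exp(1j * cmath.pi / n)    # psi^2 = omega, psi^n = -1
    e = [1] * n
    for a in polys:
        # Transform the psi-weighted coefficient vector and multiply pointwise.
        g = dft([psi ** j * a[j] for j in range(n)], omega)
        e = [ek * gk for ek, gk in zip(e, g)]
    # Inverse DFT, then undo the psi weighting to recover d_0, ..., d_{n-1}.
    d_tilde = [x / n for x in dft(e, 1 / omega)]
    return [round((d_tilde[j] / psi ** j).real) for j in range(n)]


def modular_product_direct(polys, n):
    """Reference: multiply out, then fold with d_i = sum_r (-1)^r b_{nr+i}."""
    b = [1]
    for a in polys:
        c = [0] * (len(b) + n - 1)
        for i, bi in enumerate(b):
            for j in range(n):
                c[i + j] += bi * a[j]
        b = c
    d = [0] * n
    for idx, coeff in enumerate(b):
        d[idx % n] += (-1) ** (idx // n) * coeff
    return d
```

With $n = 2$ and the factors $1 + 2x$, $3 + 4x$, $5 + 6x$, the full product is $15 + 68x + 100x^2 + 48x^3$, which folds to $-85 + 20x$ modulo $x^2 + 1$.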


2.4 Elementary Functions of Power Series


An  immediate  consequence  of  Theorem  2.3  is 


COROLLARY 2.1. The composition of two power series in $\mathcal{R}[[x]]$ has depth $O(\log n)$.
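One way to see why composition reduces to products of power series is the expansion $A(B(x)) = \sum_i a_i B(x)^i$. The Python sketch below evaluates it on the first $n$ coefficients, under the assumption (not stated in the text) that $B(x)$ has zero constant term, so that only the first $n$ powers contribute. It accumulates the powers sequentially, and the names `mul_trunc` and `compose` are hypothetical.

```python
def mul_trunc(p, q, n):
    """Product of two coefficient lists, keeping only the first n coefficients."""
    out = [0] * n
    for i, pi in enumerate(p[:n]):
        for j, qj in enumerate(q[: n - i]):
            out[i + j] += pi * qj
    return out


def compose(a, b, n):
    """First n coefficients of A(B(x)), assuming b[0] == 0."""
    assert b[0] == 0
    result = [0] * n
    power = [1] + [0] * (n - 1)  # B(x)^0
    for ai in a[:n]:
        result = [r + ai * p for r, p in zip(result, power)]
        power = mul_trunc(power, b, n)  # next power of B(x), truncated
    return result
```

For $A(x) = 1 + x + x^2$ and $B(x) = 2x + x^2$ the composition is $1 + 2x + 5x^2 + 4x^3 + x^4$.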


The  elementary  functions  exp(x),  log  (x) ,  sin(x),  cos (x) ,  arctan(x),  and 
square  root(x),  etc.  all  have  known  Taylor  series  expansions  convergent  over 
2.3 Modular Products of Polynomials

Let $B(x) = \prod_{i=1}^{m} A_i(x)$ be the product polynomial considered in the previous section. Here we consider the computation of the modular product $D(x) = \sum_{i=0}^{n-1} d_i x^i$ where $D(x) = B(x) \bmod (x^n + 1)$.

LEMMA 2.2. The coefficients of $D(x)$ are $d_i = \sum_{r=0}^{m-1} (-1)^r b_{nr+i}$ for $i = 0, \ldots, n-1$.


Proof.

\[ B(x) = \sum_{j=0}^{N-1} b_j x^j = \sum_{r=0}^{m-1} \sum_{i=0}^{n-1} b_{nr+i} x^{nr+i} \equiv \sum_{r=0}^{m-1} \sum_{i=0}^{n-1} (-1)^r b_{nr+i} x^i \pmod{x^n + 1} \]

since $(-1)^r \equiv x^{nr} \bmod (x^n + 1)$.


We assume $\omega$ is a principal $n$-th root of unity in $\mathcal{R}$ and $n$ has a multiplicative inverse. We also assume there exists a $\psi$ such that $\psi^2 = \omega$ and $\psi^n = -1$. Let $\tilde{a}_i = (a_{i,0}, \psi a_{i,1}, \ldots, \psi^{n-1} a_{i,n-1})^T$. The negatively wrapped convolution of $a_1, \ldots, a_m$ is

\[ \tilde{d} = (d_0, \psi d_1, \ldots, \psi^{n-1} d_{n-1})^T. \]

LEMMA 2.3. $\tilde{d} = \mathrm{DFT}_n^{-1}(\mathrm{DFT}_n(\tilde{a}_1) \cdot \ldots \cdot \mathrm{DFT}_n(\tilde{a}_m))$.


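Proof. For $i = 1, \ldots, m$ let $\mathrm{DFT}_n(\tilde{a}_i) = (g_{i,0}, \ldots, g_{i,n-1})^T$ where

\[ g_{i,k} = \sum_{j=0}^{n-1} a_{i,j} \psi^{j} \omega^{jk} \]

for $k = 0, \ldots, n-1$. Let

\[ e'_k = \prod_{i=1}^{m} g_{i,k} = \sum_{0 \le j_1, \ldots, j_m < n} \prod_{i=1}^{m} a_{i,j_i} \psi^{j_i} \omega^{j_i k}. \]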
for $N - m + 1 \le k \le N - 1$.)

In the special case $m = 2$ and $N = 2n$, the convolution vector $b = (b_0, \ldots, b_{N-1})^T = a_1 \circledast a_2$ gives the coefficients of $B(x)$. By the Convolution Theorem:

\[ a_1 \circledast a_2 = \mathrm{DFT}_N^{-1}(\mathrm{DFT}_N(a_1) \cdot \mathrm{DFT}_N(a_2)) \]

where $\cdot$ denotes pairwise product. Hence the well-known result:

THEOREM 2.2. The product of two polynomials in $\mathcal{R}[x]$ of degree $n-1$ has simultaneous depth $O(\log n)$ and size $O(n \log n)$.


In the case of general $m > 2$, we wish to compute the coefficient vector

\[ b = a_1 \circledast \cdots \circledast a_m. \]

By repeated application of the Convolution Theorem, we get

LEMMA 2.1. $b = \mathrm{DFT}_N^{-1}(\mathrm{DFT}_N(a_1) \cdot \ldots \cdot \mathrm{DFT}_N(a_m))$.

First, in parallel for $i = 1, \ldots, m$, compute $f_i = \mathrm{DFT}_N(a_i)$, where $f_i = (f_{i,0}, \ldots, f_{i,N-1})^T$. Next we compute in parallel for $j = 0, \ldots, N-1$ the elementary products $F_j = \prod_{i=1}^{m} f_{i,j}$. Finally, we compute $\mathrm{DFT}_N^{-1}((F_0, \ldots, F_{N-1})^T)$.

Since the computation of $\mathrm{DFT}_N$, $\mathrm{DFT}_N^{-1}$ and the required products $F_j$ each have depth $O(\log N)$, we have:

THEOREM 2.3. The product of $m$ polynomials in $\mathcal{R}[x]$ of degree $n-1$ has depth $O(\log(nm))$.

Note that in contrast, the naive method of repeated multiplication by Theorem 2.2 has depth $\Omega(\log(m) \log(n))$. Also note that since Theorem 2.1 applies to the real polynomials, so do Theorems 2.2 and 2.3.
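The three steps leading to Theorem 2.3 (transform each factor, multiply pointwise, transform back) can be sketched as follows. This is a hedged illustration, not the circuit itself: it assumes complex arithmetic with $\omega = e^{2\pi i/N}$ and $N = nm$, integer input coefficients so the output can be rounded, and a naive quadratic DFT in place of a logarithmic depth one. The names `dft` and `product_of_polys` are hypothetical.

```python
import cmath


def dft(v, omega):
    """Naive DFT: out[k] = sum_j v[j] * omega^(j*k)."""
    n = len(v)
    return [sum(v[j] * omega ** (j * k) for j in range(n)) for k in range(n)]


def product_of_polys(polys):
    """Coefficients b_0, ..., b_{N-1} of the product of m length-n polynomials."""
    m, n = len(polys), len(polys[0])
    N = n * m  # the product has degree m(n-1) < N, so nothing wraps around
    omega = cmath.exp(2j * cmath.pi / N)
    F = [1] * N
    for a in polys:
        # Step 1: f_i = DFT_N(a_i) on the zero-padded coefficient vector.
        f = dft(list(a) + [0] * (N - n), omega)
        # Step 2: accumulate the elementary products F_j = prod_i f_{i,j}.
        F = [Fj * fj for Fj, fj in zip(F, f)]
    # Step 3: inverse transform of (F_0, ..., F_{N-1}).
    b = [x / N for x in dft(F, 1 / omega)]
    return [round(x.real) for x in b]
```

In a circuit, the $m$ transforms in step 1 and the $N$ products in step 2 are independent of one another, which is where the $O(\log(nm))$ depth comes from; the loop above simply runs them in sequence.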


COROLLARY 3.2. The evaluation of an elementary function over a fixed interval with a Taylor series expansion convergent to accuracy $O(2^{-n})$ has boolean depth $O(\log n(\log\log n))$ and size $n^{O(1)}$.

COROLLARY 3.3. The elementary symmetric functions (see Section 2.4) of $n$ reals have boolean depth $O(\log n(\log\log n))$ and size $n^{O(1)}$.

3.4 Reciprocals and Division of Integers

Let $a$ be an integer within bounds $2^{n-1} \le a < 2^n$. Then $a$ has a binary representation $\sum_{i=0}^{n-1} a_i 2^i$ where $a_{n-1} = 1$. The reciprocal of $a$ is $2^{-n} r$, where $r = \sum_{i=0}^{\infty} r_i 2^{-i}$. We wish to compute the first $n$ bits $r_0, \ldots, r_{n-1}$. For this, we can use the product form of [Anderson et al., 67] and [Savage, 76, p. 256].

LEMMA 3.3. If $\tilde{r} = \prod_{i=0}^{\log(n+1)-1} \left(1 + (1 - 2^{-n} a)^{2^i}\right)$ then $|r - \tilde{r}| = O(2^{-n})$.

By Theorem 3.3 and the above lemma,

COROLLARY 3.4. The reciprocal can be computed within accuracy $O(2^{-n})$ by a boolean circuit of depth $O(\log n(\log\log n))$ and size $n^{O(1)}$.
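The product form can be checked numerically. With $x = 1 - 2^{-n}a$, the product $\prod_{i<k} (1 + x^{2^i})$ telescopes to the truncated geometric series $\sum_{j<2^k} x^j$, which approximates $1/(1 - x) = 2^n/a$. The Python sketch below uses exact rationals and takes $k = \lceil \log_2(n+1) \rceil$ factors, a rounding choice made here for illustration when $n + 1$ is not a power of two. The function name is hypothetical.

```python
from fractions import Fraction


def reciprocal_product_form(a, n):
    """Approximate 2^n / a for 2^(n-1) <= a < 2^n by the product form."""
    assert 2 ** (n - 1) <= a < 2 ** n
    x = 1 - Fraction(a, 2 ** n)  # 0 < x <= 1/2
    steps = n.bit_length()       # equals ceil(log2(n + 1)) for n >= 1
    approx = Fraction(1)
    term = x                     # holds x^(2^i) at iteration i
    for _ in range(steps):
        approx *= 1 + term
        term *= term             # repeated squaring
    return approx
```

Since $0 < x \le 1/2$ and at least $n + 1$ terms of the geometric series are kept, the truncation error $x^{2^k}/(1 - x)$ is at most $2^{-(n+1)} \cdot 2 = 2^{-n}$.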

COROLLARY 3.5. Given integers $a$, $b$ with binary representation containing $n$ bits, we can compute in boolean depth $O(\log n(\log\log n))$ the integer quotient $q$ and remainder $r$ such that $a = qb + r$ and $0 \le r < b$.


Further  Work  and  Open  Problems 


A subsequent paper of [Beame, Cook, and Hoover, 83] gives O(log n(log*n))
depth boolean circuits for taking the product of n integers and integer
division. These circuits are nonuniform, in the sense of [Borodin, 77], since
their construction requires polynomial time and at least linear uniform depth.

It  remains  an  open  problem  to  find  a  uniform  circuit  of  O(log  n)  depth 
for  integer  division. 

Also, the circuit depth complexity of the following problems remains open: given integers $a$, $b$ such that $0 < a, b < 2^n$,

(1) compute $a^b \bmod 2^n$

(2) compute the greatest common divisor of $a$ and $b$.

The obvious circuits for these problems have $O(n \log n)$ depth. If we use our improved techniques for integer products described in this paper, this depth bound is reduced by a factor of $O((\log\log n)/\log n)$. We conjecture that no $(\log n)^{O(1)}$ depth constant degree circuits exist for the above problems.